A Lexicon of French Quotation Verbs for Automatic Quotation Extraction

Authors

  • Benoît Sagot
  • Laurence Danlos
  • Rosa Stern

Abstract

Quotation extraction is an important information extraction task, especially when dealing with news wires. Quotations can be found in various configurations. In this paper, we focus on direct quotations introduced by a parenthetical clause, headed by a “quotation verb”. Our study is based on a large French news wire corpus from the Agence France-Presse. We introduce and motivate an analysis at the discursive level of such quotations, which differs from the syntactic analyses generally proposed. We show how we enriched the Lefff syntactic lexicon so that it provides an account for quotation verbs heading a quotation parenthetical, especially those extracted from a news wire corpus. We also sketch how these lexical entries can be extended to the discursive level in order to model quotations introduced in a parenthetical clause in a complete way.

Similar resources

Extraction of Unmarked Quotations in Newspapers A Study Based on Direct Speech Extraction Systems

This paper presents work in progress to automatically extract quotation sentences from newspaper articles. The focus is the extraction and annotation of unmarked quotation sentences. A linguistic study shows that unmarked quotation sentences can be formalised into 16 patterns that can be used to develop an extraction grammar. The question of unmarked quotation boundaries identification is also ...

QUEMDISSE? Reported speech in Portuguese

This paper presents some work on direct and indirect speech in Portuguese using corpus-based methods: we report on a study whose aim was to identify (i) Portuguese verbs used to introduce reported speech and (ii) syntactic patterns used to convey reported speech, in order to enhance the performance of a quotation extraction system, dubbed QUEMDISSE?. In addition, (iii) we present a Portuguese c...

A Repository of Variation Patterns for Multiword Expressions

One of the crucial issues in the analysis and processing of MWEs is their internal variability. Indeed, the feature that mostly characterises MWEs is their fixedness at some level of linguistic analysis, be it morphology, syntax, or semantics. The morphological aspect is not trivial in languages which exhibit a rich morphology, such as Romance languages. The issue is relevant in at least three ...

AAE Talmbout: An Overlooked Verb of Quotation

While there has been a wealth of research on verbs of quotation in recent decades (Butters 1980, Blyth et al. 1990, Tagliamonte and Hudson 1999, Buchstaller 2001, Singler 2001, Waksler 2001, Rickford et al. 2007, Vandelanotte 2012), including studies focusing on African American English (AAE) (Cukor-Avila 2002, 2012), the discussion has focused on a handful of variables, most notably be like, g...

Quotation Extraction for Portuguese

Quotation extraction consists of identifying quotations and their authors. In this work, we present a Quotation Extraction system for Portuguese that is based on Entropy Guided Transformation Learning, a supervised Machine Learning algorithm. This is the first system that uses a Machine Learning approach for Portuguese. In order to train and evaluate the proposed system, we build the GLOBOQUOTE...

Publication date: 2010